1. INTRODUCTION

The coronavirus disease 2019 situation has made clear that public health is a major national concern.
In particular, there are not enough doctors to meet patient demand. Thailand has a ratio of 1 registered
nurse per 412 people [1]. The government therefore has a policy that emphasizes people's participation in
taking care of their own health [2], so that people assess their own risk of various diseases and reduce
the need to see a doctor.

Self-diagnosis is the practice of observing one's own abnormalities in order to reduce the risk of disease.
Under current social conditions, people work hard and do not have time to go for a medical examination;
they go to the hospital only when they have a serious health problem. As a result, doctors are overloaded
and have no time to serve ordinary patients. Self-diagnosis is an option for reducing health problems and
for deciding when to use public health services. Currently, machine learning is applied to assist in
self-diagnosis [3]-[6].

Figure 1 shows our design concept for self-diagnosis. The concept divides the data into structured data
and unstructured data. Structured data comprise preliminary values measured from body characteristics,
such as blood pressure and heart rate, which are used to screen patients into a normal group and a
surveillance group [7]. The unstructured data are diagnoses based on the patient's medical history, which
have been separated into groups of symptoms. We use these data to generate a bipartite graph [8]-[11]
consisting of two classes, as in Figure 1.
Figure 1. Self-diagnosis concept

The graph in Figure 2 consists of two classes: the patient class, represented by P, and the symptom class,
represented by S. The experiment showed that there were too many classes; the graph was complicated and
inefficient to process. Therefore, the symptoms were grouped using the sliding window method [12], [13],
and the graph was created again. This produced 10 graphs, one per similarity level from 10% to 100%. The
challenge addressed in this paper is selecting the appropriate model from these 10 graphs.


Figure 2. Bipartite graph created from unstructured data


This research applies the concept of graph density, the Kappas method, and the concept of the multiple
line graph to find the most appropriate graph model for further disease prediction. This paper proposes
two graph density formulas for bipartite graphs, adapted from the general graph density and from the
Kappas method. In addition, multiple line graph theory is used to help confirm the results.


2. RESEARCH METHOD 

In this section, we discuss the theory applied to find the most appropriate graph model for further
disease prediction: graph density, the Kappas, and the multiple line graph. Sections 2.1 and 2.2 both
discuss graph density, but with different methods. Section 2.3 uses the multiple line graph to
corroborate the results. The experiment design accordingly has three stages. The first stage is the graph
density test. The second stage also calculates graph density, but with a different algorithm. The third
stage applies multiple line graph theory.


2.1. Graph density 

Graph density is a mathematical measure of how dense a graph is: the ratio of the number of edges to the
total number of possible edges. It is used to characterize the communication and connection between
nodes and allows graphs to be compared as more or less dense. A large value indicates many connections
among the nodes, which means that nodes have multiple paths to reach each other [14]-[17]. A small value
means fewer edges and therefore fewer choices of route for communication. The density of an undirected
graph is calculated as in (1):
$$D = \frac{2|E|}{|V|(|V|-1)} \tag{1}$$

— D refers to the density of the graph.
— |E| refers to the number of actual edges.
— |V| refers to the number of nodes.

The minimum density is 0, which means that the graph has no edges at all. The maximum density is 1, i.e.,
the number of edges equals the total number of possible edges. Many aspects of the graph density problem
are still being developed and have branched into several theories, both for selecting clusters and for
determining the importance of node relationships [18]-[20].
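
As a small illustration of these bounds, the following sketch computes the density of a simple undirected graph using the standard formula 2|E| / (|V|(|V| - 1)). The function name `undirected_density` and the 4-node example are assumptions made for illustration only; they do not come from the experiment.

```python
def undirected_density(num_edges: int, num_nodes: int) -> float:
    """Density of a simple undirected graph: 2|E| / (|V| * (|V| - 1))."""
    if num_nodes < 2:
        # With fewer than two nodes no edge is possible; report 0 by convention.
        return 0.0
    max_edges = num_nodes * (num_nodes - 1) / 2  # all unordered node pairs
    return num_edges / max_edges


# Hypothetical 4-node graph: at most 6 edges are possible.
print(undirected_density(0, 4))  # 0.0 (no edges)
print(undirected_density(3, 4))  # 0.5 (half of the possible edges)
print(undirected_density(6, 4))  # 1.0 (complete graph)
```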

We applied the above theory to find the appropriate graph density. It is calculated as the ratio of the
actual edges to all possible edges, as shown in (2).

$$\text{Density} = \frac{\text{Number of edges}}{\text{Number of possible edges}} \tag{2}$$

The number of edges is the total number of edges in the graph, and the number of possible edges is the
maximum number of edges the graph can have, obtained by multiplying the numbers of nodes in the two sets
of the bipartite graph. The maximum density is 1, meaning that the number of edges equals the number of
possible edges, and the minimum is 0, meaning no edges at all. We take 0.5, the midpoint between 0 and 1,
as the best value, so the appropriate graph is the one whose density is closest to 0.5.
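
The bipartite density and its distance from 0.5 can be sketched as follows. The function names are hypothetical, and the example values (122 edges, 36 patient nodes, 8 symptom-group nodes) are taken from the 40% similarity row of Table 1.

```python
def bipartite_density(num_edges: int, n_left: int, n_right: int) -> float:
    """Edges divided by the maximum number of edges between the two node sets."""
    possible_edges = n_left * n_right
    if possible_edges == 0:
        raise ValueError("both node sets must be non-empty")
    return num_edges / possible_edges


def distance_to_midpoint(density: float, midpoint: float = 0.5) -> float:
    """Absolute distance of a density from the target midpoint."""
    return abs(density - midpoint)


# 40% similarity row of Table 1: 122 edges, 36 patients, 8 symptom groups.
d = bipartite_density(122, 36, 8)
print(round(d, 6))                        # 0.423611
print(round(distance_to_midpoint(d), 6))  # 0.076389
```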


2.2. The Kappas 

The Kappas [21], [22] is a development of the graph density concept that measures the strength of a node
partition in order to find groups of nodes. It determines the global connection and compares it with the
local connection. The method computes K_intra, the mean of the local connections, and K_inter, the global
average. The Kappas method is based on the general graph density in (3).


$$K = \frac{|E|}{0.5 \times N(N-1)} \tag{3}$$

— K refers to the density of the graph
— |E| refers to the number of edges in the graph
— N refers to the number of nodes

The formula is adjusted for the local connection as in (4).

$$K_{intra} = \frac{1}{m}\sum_{i=1}^{m} K_i = \frac{1}{m}\sum_{i=1}^{m} \frac{|E_i|}{0.5 \times n_i(n_i-1)} \tag{4}$$

— K_intra refers to the intra density of the graph
— |E_i| refers to the number of intra edges in group i
— n_i refers to the number of connected nodes in group i
— m refers to the number of groups

The formula for calculating the global connection is (5).

(5)

— K_inter refers to the inter density of the graph
— |E| refers to the number of edges in the graph
— n refers to the number of connected nodes

The Kappas value is calculated by comparing the global connection of the graph with the local connection
of the nodes. In this research, the formula has been adapted to our bipartite graphs as in (6).

$$\text{Kappa Density} = \frac{\text{Number of edges}}{0.5 \times N_1 \times N_2} \tag{6}$$

— N_1 refers to the number of patients
— N_2 refers to the number of symptoms

The maximum value is 2, meaning that the number of edges equals the number of possible edges. The best
value is 1, the midpoint between 0 and 2. This formula differs from the density in Section 2.1 in that it
is more elaborate, because the density is calculated from both the external and internal connections of
the nodes. It is used to corroborate Section 2.1: if the two experimental results point in the same
direction, the results are considered reliable.
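
A minimal sketch of the Kappa density follows, assuming that (6) divides the number of edges by 0.5 x N1 x N2, which is the reading that reproduces the Kappas column of Table 1. The function name `kappa_density` is hypothetical, and the example uses the 40% similarity row.

```python
def kappa_density(num_edges: int, n_patients: int, n_symptoms: int) -> float:
    """Assumed form of (6): edges / (0.5 * N1 * N2); ranges from 0 to 2."""
    half_possible = 0.5 * n_patients * n_symptoms
    if half_possible == 0:
        raise ValueError("both node sets must be non-empty")
    return num_edges / half_possible


# 40% similarity row of Table 1: 122 edges, 36 patients, 8 symptom groups.
k = kappa_density(122, 36, 8)
print(round(k, 6))           # 0.847222
print(round(abs(k - 1), 6))  # 0.152778, the distance from the target value 1
```

Under this reading the Kappa density is exactly twice the bipartite density of Section 2.1, so both measures rank the candidate graphs in the same order.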


2.3. Multiple line graph 

A line graph contains two or more data points connected by a line, and explains the relationship between
the two axes of the graph. A line graph can show how data vary and trend, which leads to predictions
about the data.

A multiple line graph contains two or more lines and shows two or more variables changing at the same
time. It is used for comparing the lines, viewing trends, and examining the relation between the lines
[23]-[27]. An example of a multiple line graph is shown in Figure 3.
Figure 3. Multiple line graph showing monthly sales


Figure 3 shows a multiple line graph of the sales of 3 products from January to May. It shows that the
three products trend in the same direction and that the sales of the second product fall between those
of the first and third products. This research is a continuation of study [7] on applying sliding windows
[12] to group symptoms. That study produced 10 graphs grouped by similarity, ordered by similarity level
from 10% to 100%. Each graph is a bipartite graph as in Figure 2: on the left side is the patient sample
class, P1-P36, and on the other side are the disease groups, N1-N22.

Figure 4 shows the graph created at the 100% similarity level of the symptom groups, where the N classes
have not been grouped. Figure 5 shows the graph created at the 50% similarity level, where the number of
N classes is reduced from 22 to 12 because of the grouping. In Figure 6, at 20% similarity, there is only
one N class.

Figures 4 to 6 show that the number of N classes tends to decrease as the similarity goes from 100% to
20%. This research aims to select one graph as the proper model for self-diagnosis. The appropriate graph
is one that fits the model: it is not overly complicated, and its number of classes is not so small as to
be unusable. Therefore, the concept of graph density was applied to select the appropriate graph.

This research also applied the multiple line graph, where the X-axis represents the patient classes
P1-P36 and the Y-axis is the number of edges leaving each node. The assessment is based on the trend of
the lines: the optimal line is the one that lies in the middle of all the lines. For example, Figure 7
shows that line P2 lies between lines P1 and P3. The value of P2 is neither the highest nor the lowest of
the three, so P2 is the appropriate value.
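
The sketch below shows one way the Y-axis values and the "middle line" could be computed. The text does not define "middle" formally, so the criterion used here (smallest total deviation from the point-wise median of all lines) is an assumption, as are the function names and the toy numbers; only the line names P1, P2, P3 follow the example of Figure 7.

```python
from collections import Counter
from statistics import median


def patient_degrees(edges, patients):
    """Number of edges leaving each patient node, in the order of `patients`."""
    counts = Counter(patient for patient, _ in edges)
    return [counts.get(patient, 0) for patient in patients]


def middle_line(lines):
    """Name of the line closest to the point-wise median of all lines.

    `lines` maps a line name to its list of Y values (equal lengths).
    Assumed criterion: smallest summed absolute deviation from the median.
    """
    names = list(lines)
    n_points = len(lines[names[0]])
    pointwise_median = [median(lines[name][i] for name in names)
                        for i in range(n_points)]

    def deviation(name):
        return sum(abs(y - m) for y, m in zip(lines[name], pointwise_median))

    return min(names, key=deviation)


# Hypothetical toy edge list: (patient, symptom group) pairs.
toy_edges = [("P1", "N1"), ("P1", "N2"), ("P2", "N1")]
print(patient_degrees(toy_edges, ["P1", "P2", "P3"]))  # [2, 1, 0]

# Hypothetical lines shaped like the Figure 7 example: P2 lies between P1 and P3.
toy_lines = {"P1": [5, 6, 7], "P2": [3, 4, 5], "P3": [1, 2, 3]}
print(middle_line(toy_lines))  # P2
```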
Figure 6. Graph at 20% similarity

Figure 7. Multiple line graph showing optimal value example


3. RESULTS AND DISCUSSION
3.1. Experimental results from graph density and Kappa method 

This section presents the experimental results of the three methods used to find the appropriate model
among the 10 models. Table 1 shows the graph density results and the Kappas calculations. The density is
the value calculated with (2), and the Kappas is the value calculated with (6).


Table 1. Graph density and Kappas experiment results

Similarity  No. of classes (N)  No. of classes (P)  Edges  Density   Distance to 0.5  Kappas
100%        21                  36                  130    0.171958  0.328042         0.343915
90%         21                  36                  130    0.171958  0.328042         0.343915
80%         20                  36                  130    0.180556  0.319444         0.361111
70%         18                  36                  130    0.200617  0.299383         0.401235
60%         16                  36                  130    0.225694  0.274306         0.451389
50%         13                  36                  127    0.271368  0.228632         0.542735
40%         8                   36                  122    0.423611  0.076389         0.847222
30%         2                   36                  60     0.833333  0.333333         1.666667
20%         1                   36                  36     1         0.5              2
10%         1                   36                  36     1         0.5              2


Table 1 shows the similarity level, the number of N classes, the number of P classes, the number of
edges, and the density values. The 5th column shows the density, and the 6th column shows its distance
from 0.5, which is considered the midpoint between 0 and 1. The density at 40% similarity is the closest
to 0.5, at a distance of only 0.076389. The 7th column shows the Kappas value; the value closest to 1 is
0.847222, also at 40% similarity.
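
The selection step can be sketched as follows, using the class and edge counts reported in Table 1. The variable names are hypothetical, and the Kappas value is assumed to be the number of edges divided by 0.5 x N x P, which matches the reported column.

```python
# (similarity %, number of N classes, number of P classes, edges) from Table 1.
rows = [
    (100, 21, 36, 130), (90, 21, 36, 130), (80, 20, 36, 130),
    (70, 18, 36, 130), (60, 16, 36, 130), (50, 13, 36, 127),
    (40, 8, 36, 122), (30, 2, 36, 60), (20, 1, 36, 36), (10, 1, 36, 36),
]


def density(row):
    _, n, p, edges = row
    return edges / (n * p)


def kappas(row):
    _, n, p, edges = row
    return edges / (0.5 * n * p)


# Pick the level whose density is closest to 0.5 and whose Kappas is closest to 1.
best_by_density = min(rows, key=lambda r: abs(density(r) - 0.5))[0]
best_by_kappas = min(rows, key=lambda r: abs(kappas(r) - 1.0))[0]

print(best_by_density, best_by_kappas)  # 40 40
```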


3.2. Select model with multiple line graph 

Following the concept in Section 2.3, the X-axis represents the patient classes P1-P36 and the Y-axis is
the number of edges of each node. The graph shows the models at similarity levels from 100 down to 10,
where some levels may coincide with others. For example, the line for similarity 100 is identical to that
for 90, so they are shown as the same line. The results are shown in Figure 8.

Figure 8. Multiple line graph of patient class and number of edges


In Figure 8, the blue line represents P100 (100% similarity), the red line P80 (80%), the green line P70
(70%), the purple line P60 (60%), the blue line P50 (50%), the orange line P40 (40%), the dark blue line
P30 (30%), and the red line P20 (20%). The appropriate values are found on the orange line, the 40%
similarity level, whose values lie between those of the remaining lines. These results therefore support
the density and Kappas values in selecting the model at 40% similarity.


4. CONCLUSION

The self-diagnosis concept was designed by separating structured and unstructured data. The unstructured
data are used to create medical terminology, which leads to bipartite graphs with two classes: patient
classes and symptom classes. However, the resulting graph is complicated to process. Grouping the
symptoms yields 10 graphs, one per similarity level from 10% to 100%.

In this paper, we proposed using graph density and the Kappas method, supported by the concept of graph
trend, to select an appropriate graph for self-diagnosis. All three methods indicate that the appropriate
graph is the one at 40% similarity. First, the graph density at 40% similarity is 0.423611, which has the
shortest distance from 0.5. Second, the Kappas value at 40% similarity is 0.847222, which is the closest
to 1. Third, in the graph trend the 40% similarity line lies in the middle of all the lines. Therefore,
the graph at 40% similarity was chosen as the model for self-diagnosis. The selected level is closer to
0% than to 100% because there is not much difference between the 80%-100% levels.

The experiment was based on the assumption that the most appropriate of the 10 models is to be selected
for further processing. This paper presents two graph density methods for selecting the desired model and
uses multiple line graph theory to assist in the selection. This research requires a model that is
neither too complicated nor too simple, for efficient processing, and the methods presented here are
effective at discriminating by complexity. Research that needs to select models with different
characteristics should use alternative selection methods that can expose the required characteristics. In
future work, we will examine the selected model on other disease groups to further confirm its validity.